The five systems of the CLUSTER dataset.


1. Description of the CLUSTER dataset

 (1) *CLASS_relationInfo_whole.ser*: The complete set of code dependency relations. You can use it to derive code regions by setting thresholds on call dependencies and data dependencies.

 (2) *merged-code-region*: All class files of a code region, merged into a single file. You can use it to compute the IR values between code regions and requirements.
         E.g., "CacheUtils_9.txt" (Maven): 'CacheUtils' is the name of one class in this code region, and '9' means the code region contains 9 class files.

 (3) *normalized-uc*: The documents of both requirements and classes, normalized with standard pre-processing techniques: identifier splitting, special token elimination, stemming, and stop word removal.

 (4) *RTM_CLASS*: Requirements-to-code Trace Matrices.
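
The file naming convention in item (2) can be decoded with a few lines of code. The following is a minimal sketch, assuming every file in *merged-code-region* follows the `<ClassName>_<classCount>.txt` pattern seen in the "CacheUtils_9.txt" example; the function name `parse_region_filename` is hypothetical and not part of the dataset.

```python
import re
from pathlib import Path

# Assumed pattern "<ClassName>_<classCount>.txt", inferred from the single
# example "CacheUtils_9.txt"; other files may deviate from it.
_REGION_NAME = re.compile(r"^(?P<class_name>.+)_(?P<class_count>\d+)\.txt$")


def parse_region_filename(filename):
    """Return (class_name, class_count) for a merged-code-region file name."""
    match = _REGION_NAME.match(Path(filename).name)
    if match is None:
        raise ValueError(f"unexpected merged-code-region file name: {filename!r}")
    return match.group("class_name"), int(match.group("class_count"))


print(parse_region_filename("CacheUtils_9.txt"))
```

The count is taken from the digits after the last underscore, so class names that themselves contain underscores are still handled.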

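The pre-processing steps named in item (3) can be illustrated roughly as follows. This is only a sketch of the general technique, not the pipeline used to build *normalized-uc*: the stop word list is a tiny illustrative one, the function names are hypothetical, and stemming is left as a pluggable `stem` parameter (identity by default) because the stemmer used for the dataset is not stated.

```python
import re

# Tiny illustrative stop list; the list used for the dataset is not specified.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "is", "and", "or", "for"}


def split_identifier(token):
    """Split camelCase, PascalCase, and snake_case identifiers into words."""
    parts = re.sub(r"[_\W]+", " ", token)                      # drop punctuation and underscores
    parts = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", parts)      # fooBar -> foo Bar
    parts = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1 \2", parts)   # HTTPResponse -> HTTP Response
    return parts.split()


def normalize(text, stem=lambda word: word):
    """Split identifiers, drop non-alphabetic tokens and stop words, then stem."""
    words = []
    for raw in text.split():
        for word in split_identifier(raw):
            word = word.lower()
            # Special token elimination: keep purely alphabetic words only.
            if not word.isalpha() or word in STOP_WORDS:
                continue
            words.append(stem(word))
    return words


print(normalize("getCacheUtils() returns the HTTPResponse of a request_id"))
```

To apply real stemming, pass any callable that maps a word to its stem as the `stem` argument.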

2. Links to download the source code of the five systems

 (1) iTrust: https://sourceforge.net/projects/itrust/files/itrust/13.0/

 (2) Maven: https://github.com/apache/maven
 
 (3) Pig: https://github.com/apache/pig

 (4) Infinispan: https://github.com/infinispan/infinispan

 (5) GanttProject: https://github.com/bardsoftware/ganttproject



